Pedestrian trajectory prediction based on multi-head soft attention graph convolutional network
Tao PENG, Yalong KANG, Feng YU, Zili ZHANG, Junping LIU, Xinrong HU, Ruhan HE, Li LI
Journal of Computer Applications    2023, 43 (3): 736-743.   DOI: 10.11772/j.issn.1001-9081.2022020207

The complexity of pedestrian interaction is a challenge for pedestrian trajectory prediction: existing algorithms struggle to capture meaningful interaction information between pedestrians and cannot model pedestrian interactions intuitively. To address this problem, a multi-head soft attention graph convolutional network was proposed. Firstly, Multi-head Soft ATTention (MS ATT) combined with an involution network was used to extract a sparse spatial adjacency matrix and a sparse temporal adjacency matrix from the spatial and temporal graph inputs respectively, generating a sparse spatial directed graph and a sparse temporal directed graph. Then, a Graph Convolutional Network (GCN) was used to learn interaction and motion-trend features from the sparse spatial and temporal directed graphs. Finally, the learned trajectory features were fed into a Temporal Convolutional Network (TCN) to predict the parameters of a bivariate Gaussian distribution, from which the predicted pedestrian trajectories were generated. Experiments on the Eidgenossische Technische Hochschule (ETH) and University of CYprus (UCY) datasets show that, compared with the Space-time sOcial relationship pooling pedestrian trajectory Prediction Model (SOPM), the proposed algorithm reduces the Average Displacement Error (ADE) by 2.78%, and compared with the Sparse Graph Convolution Network (SGCN), it reduces the Final Displacement Error (FDE) by 16.92%.
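
A minimal PyTorch sketch of this pipeline may help fix ideas: attention scores are thresholded into a sparse directed adjacency matrix, a GCN propagates features over it, and a temporal convolution emits the five bivariate-Gaussian parameters per step. The head count, threshold rule, and layer sizes below are illustrative assumptions; the involution feature extractor and the paper's exact MS ATT formulation are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSoftAttention(nn.Module):
    """Scores pairwise pedestrian links, keeps only strong ones (assumed rule)."""
    def __init__(self, dim, heads=4, threshold=0.5):
        super().__init__()
        assert dim % heads == 0
        self.q, self.k = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.heads, self.threshold = heads, threshold

    def forward(self, x):                            # x: (N, dim), N pedestrians
        d = x.size(-1) // self.heads
        q = self.q(x).view(-1, self.heads, d)
        k = self.k(x).view(-1, self.heads, d)
        scores = torch.einsum('ihd,jhd->hij', q, k) / d ** 0.5
        attn = scores.softmax(dim=-1).mean(dim=0)    # fuse heads: (N, N)
        # Zero out weak links to obtain a sparse directed adjacency matrix.
        return attn * (attn > self.threshold / x.size(0)).float()

class GCNLayer(nn.Module):
    """Standard A*X*W propagation over the sparse directed graph."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        return F.relu(adj @ self.w(x))

class BiGaussianHead(nn.Module):
    """Temporal convolution emitting (mu_x, mu_y, sigma_x, sigma_y, rho)."""
    def __init__(self, dim):
        super().__init__()
        self.tcn = nn.Conv1d(dim, 5, kernel_size=3, padding=1)

    def forward(self, h):                            # h: (N, T, dim)
        out = self.tcn(h.transpose(1, 2)).transpose(1, 2)   # (N, T, 5)
        mu = out[..., :2]
        sigma = out[..., 2:4].exp()                  # keep scales positive
        rho = out[..., 4].tanh()                     # correlation in (-1, 1)
        return mu, sigma, rho                        # sample per step at inference

At inference, each predicted position is drawn from the per-step bivariate Gaussian, matching the abstract's final step.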

Cascaded cross-domain feature fusion for virtual try-on
Xinrong HU, Junyu ZHANG, Tao PENG, Junping LIU, Ruhan HE, Kai HE
Journal of Computer Applications    2022, 42 (4): 1269-1274.   DOI: 10.11772/j.issn.1001-9081.2021071274

Virtual try-on technologies based on an image synthesis mask strategy preserve clothing details well when the warped clothing is fused with the human body. However, because the positions and structures of the human body and the clothing are difficult to align during try-on, the result is prone to severe occlusion, which degrades the visual effect. To address this occlusion, a U-Net based generator was proposed, in which a cascaded spatial attention module and a channel attention module were added to the U-Net decoder, achieving cross-domain fusion between local features of the warped clothing and global features of the human body. Specifically, first, the clothing was warped to the target human pose using a Thin Plate Spline (TPS) transformation predicted by a convolutional network. Then, the dressed-person representation and the warped clothing were input into the proposed generator, which produced a mask image of the corresponding clothing area and rendered an intermediate result. Finally, the mask synthesis strategy composited the warped clothing with the intermediate result through mask processing to obtain the final try-on result. Experimental results show that the proposed method not only reduces occlusion but also enhances image details. Compared with the Characteristic-Preserving Virtual Try-On Network (CP-VTON) method, the proposed method increases the average Peak Signal-to-Noise Ratio (PSNR) of generated images by 10.47%, decreases the average Fréchet Inception Distance (FID) by 47.28%, and increases the average Structural SIMilarity (SSIM) by 4.16%.
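
A hedged sketch of the two attention blocks and the mask-synthesis step may clarify the pipeline. The SE-style channel attention and CBAM-style spatial attention below are assumptions about the block designs, as are all sizes; only the cascade in the U-Net decoder and the final compositing follow the abstract.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Reweights decoder channels from globally pooled statistics (SE-style)."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # (B, C) channel weights
        return x * w[:, :, None, None]

class SpatialAttention(nn.Module):
    """Highlights clothing-relevant locations (CBAM-style, assumed)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

def mask_synthesis(warped_cloth, intermediate, mask):
    """Final step: warped-cloth pixels where the predicted clothing mask is
    active, the rendered intermediate result elsewhere."""
    return mask * warped_cloth + (1 - mask) * intermediate

In a decoder stage the two blocks would run in sequence, e.g. y = ChannelAttention(c)(SpatialAttention()(x)), so spatially selected local clothing features are then reweighted against global body features.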

Fast depth video coding algorithm based on region division
TIAN Tao, PENG Zongju
Journal of Computer Applications    2013, 33 (06): 1706-1710.   DOI: 10.3724/SP.J.1087.2013.01706
As the main scheme for 3D scene representation, multiview video plus depth is attracting increasing attention. Depth video reflects the geometric information of the scene, so designing a fast depth video encoding algorithm is important. A fast depth video coding algorithm based on region division was proposed. Firstly, the depth video was divided into four regions according to edge and motion features. Then, the macroblock mode distribution and multi-reference-frame selection characteristics of the different regions were analyzed. Accordingly, different macroblock mode decision and reference frame selection methods were applied to each region to speed up depth video encoding. Finally, experiments were conducted to evaluate the proposed algorithm in terms of encoding time, bit rate and virtual view quality. Experimental results show that the proposed algorithm saves 85.73% to 91.06% of encoding time while maintaining virtual view quality and bit rate.
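
The region-division idea lends itself to a short sketch: each depth block is binned into one of four regions by simple edge and motion tests, and the encoder then searches only a reduced mode and reference-frame set for that region. The thresholds, region names, and candidate sets below are illustrative assumptions, not the paper's measured settings.

import numpy as np

EDGE_T, MOTION_T = 20.0, 2.0              # assumed classification thresholds

def classify_block(block, prev_block):
    """Assign a depth macroblock to one of four regions (assumed tests)."""
    gy, gx = np.gradient(block.astype(np.float32))
    edgy = np.hypot(gx, gy).mean() > EDGE_T                    # edge feature
    diff = np.abs(block.astype(np.float32) - prev_block.astype(np.float32))
    moving = diff.mean() > MOTION_T                            # motion feature
    if edgy and moving:
        return "moving_edge"
    if edgy:
        return "static_edge"
    if moving:
        return "moving_flat"
    return "static_flat"

# Per-region (mode candidates, reference frames): flat static areas get the
# cheapest search, moving edges keep the full search (assumed mapping).
CANDIDATES = {
    "static_flat": (["SKIP", "16x16"], 1),
    "moving_flat": (["SKIP", "16x16", "16x8", "8x16"], 1),
    "static_edge": (["16x16", "8x8", "Intra"], 2),
    "moving_edge": (["SKIP", "16x16", "16x8", "8x16", "8x8", "Intra"], 4),
}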